Results 1 - 20 of 29
1.
Res Sq ; 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38559017

ABSTRACT

Peptide design, with the goal of identifying peptides possessing unique biological properties, stands as a crucial challenge in peptide-based drug discovery. While traditional and computational methods have made significant strides, they often encounter hurdles due to the complexities and costs of laboratory experiments. Recent advancements in deep learning and Bayesian Optimization have paved the way for innovative research in this domain. In this context, our study presents a novel approach that effectively combines protein structure prediction with Bayesian Optimization for peptide design. By applying carefully designed objective functions, we guide and enhance the optimization trajectory for new peptide sequences. Benchmarked against multiple native structures, our methodology is tailored to generate new peptides with optimized biological properties.

2.
Nat Rev Bioeng ; 2(2): 136-154, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38576453

ABSTRACT

Denoising diffusion models embody a type of generative artificial intelligence that can be applied in computer vision, natural language processing and bioinformatics. In this Review, we introduce the key concepts and theoretical foundations of three diffusion modelling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks and score stochastic differential equations). We then explore their applications in bioinformatics and computational biology, including protein design and generation, drug and small-molecule design, protein-ligand interaction modelling, cryo-electron microscopy image data analysis and single-cell data analysis. Finally, we highlight open-source diffusion model tools and consider the future applications of diffusion models in bioinformatics.
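
As a rough illustration of the first framework named in this abstract, the forward (noising) process of a denoising diffusion probabilistic model can be sampled in closed form as x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, where ᾱ_t is the running product of (1 − β_s). The sketch below is not taken from the Review; the function names, the linear β schedule, and its endpoints (1e-4 to 0.02 over 1000 steps) are assumptions chosen for illustration.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]


def alpha_bar(betas):
    """Cumulative products of (1 - beta_s): the fraction of signal kept at each step."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Draw x_t from N(sqrt(abar_t) * x0, (1 - abar_t) * I) in one shot."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


# Hypothetical usage: noise a toy 3-dimensional data point at the last step.
abar_demo = alpha_bar(linear_beta_schedule(1000))
x_noisy = q_sample([0.5, -1.0, 2.0], 999, abar_demo, random.Random(0))
```

At the final step ᾱ_t is close to zero, so the sample is nearly pure Gaussian noise; a trained model would learn to reverse this process step by step.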

3.
Nucleic Acids Res ; 52(D1): D426-D433, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37933852

ABSTRACT

The DescribePROT database of amino acid-level descriptors of protein structures and functions has been substantially expanded since its release in 2020. This expansion includes a substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and the whole database. The annotations in DescribePROT are useful for a broad spectrum of studies, including investigations of protein structure and function, development and validation of predictive tools, and efforts to understand the molecular underpinnings of diseases and to develop therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.


Subjects
Amino Acids , Proteome , Proteome/chemistry , Databases, Factual
4.
bioRxiv ; 2024 Jan 28.
Article in English | MEDLINE | ID: mdl-37609352

ABSTRACT

Large protein language models (PLMs) present excellent potential to reshape protein research by encoding amino acid sequences into mathematically and biologically meaningful embeddings. However, the lack of crucial 3D structure information in most PLMs restricts their prediction capacity in various applications, especially those that depend heavily on 3D structures. To address this issue, we introduce S-PLM, a 3D structure-aware PLM utilizing multi-view contrastive learning to align the sequence and 3D structure of a protein in a coordinate space. S-PLM applies Swin-Transformer on AlphaFold-predicted protein structures to embed the structural information and fuses it into sequence-based embedding from ESM2. Additionally, we provide a library of lightweight tuning tools to adapt S-PLM for diverse protein property prediction tasks. Our results demonstrate S-PLM's superior performance over sequence-only PLMs, achieving competitiveness in protein function prediction compared to state-of-the-art methods employing both sequence and structure inputs.

5.
Molecules ; 28(19)2023 Sep 25.
Article in English | MEDLINE | ID: mdl-37836636

ABSTRACT

Interactions between proteins and ions are essential for various biological functions like structural stability, metabolism, and signal transport. Given that more than half of all proteins bind to ions, it is becoming crucial to identify ion-binding sites. The accurate identification of protein-ion binding sites helps us to understand proteins' biological functions and plays a significant role in drug discovery. While several computational approaches have been proposed, this remains a challenging problem due to the small size and high versatility of metals and acid radicals. In this study, we propose IonPred, a sequence-based approach that employs ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to predict ion-binding sites using only raw protein sequences. We successfully fine-tuned our pretrained model to predict the binding sites for nine metal ions (Zn²⁺, Cu²⁺, Fe²⁺, Fe³⁺, Ca²⁺, Mg²⁺, Mn²⁺, Na⁺, and K⁺) and four acid radical ion ligands (CO₃²⁻, SO₄²⁻, PO₄³⁻, NO₂⁻). IonPred surpassed six current state-of-the-art tools by over 44.65% and 28.46%, respectively, in the F1 score and MCC when compared on an independent test dataset. Our method is more computationally efficient than existing tools, producing prediction results for a hundred sequences for a specific ion in under ten minutes.


Subjects
Metals , Proteins , Ligands , Proteins/chemistry , Binding Sites , Protein Binding , Metals/chemistry , Ions/chemistry
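
For readers unfamiliar with the two metrics quoted in this abstract, the F1 score and the Matthews correlation coefficient (MCC) can both be computed from the four confusion-matrix counts of a binary per-residue classifier. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the IonPred evaluation.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical counts for an imbalanced binding-site task (few positives).
print(f1_score(tp=40, fp=10, fn=50))
print(mcc(tp=40, tn=900, fp=10, fn=50))
```

MCC uses all four counts, including true negatives, which is why it is often reported next to F1 when binding residues are a small minority of all residues.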
6.
Am J Cardiol ; 204: 207-214, 2023 10 01.
Article in English | MEDLINE | ID: mdl-37556889

ABSTRACT

Because the 6-minute walking test (6MWT) is a self-paced submaximal test, the 6-minute walking distance (6MWD) is substantially influenced by individual effort level and physical condition, which is difficult to quantify. We aimed to explore the optimal indicator reflecting the perceived effort level during 6MWT. We prospectively enrolled 76 patients with pulmonary arterial hypertension and 152 healthy participants; they performed two 6MWTs at 2 different speeds: (1) at leisurely speed, as performed in daily life without extra effort (leisure 6MWT) and (2) at an increased walking speed, walking as the guideline indicated (standard 6MWT). The factors associated with 6MWD during standard 6MWT were investigated using a multiple linear regression analysis. The heart rate (HR) and Borg score increased and oxygen saturation (SpO₂) decreased after walking in both 6MWTs in both groups (all p <0.001). The ratio of difference in HR before and after each test (ΔHR) to HR before walking (HR at rest) and the difference in SpO₂ (ΔSpO₂) and Borg (ΔBorg) before and after each test were all significantly higher in both groups after standard 6MWT than after leisure 6MWT (all p <0.001). Multiple linear regression analysis revealed that ΔHR/HR at rest was an independent predictor of 6MWD during standard 6MWT in both groups (both p <0.001, adjusted R² = 0.737 and 0.49, respectively). 6MWD and ΔHR/HR at rest were significantly lower in patients than in healthy participants (both p <0.001) and in patients with cardiac functional class III than in patients with class I/II (both p <0.001). In conclusion, ΔHR/HR at rest is a good reflector of combined physical and effort factors. HR response should be incorporated into 6MWD to better assess a participant's exercise capacity.


Subjects
Pulmonary Arterial Hypertension , Humans , Heart Rate , Walk Test , Walking/physiology , Regression Analysis , Exercise Test , Exercise Tolerance
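
The indicator described in this abstract is the change in heart rate across the test divided by the resting heart rate. A minimal sketch of that arithmetic follows; the function name and the example heart rates are hypothetical, and the abstract does not specify how or when HR was sampled.

```python
def hr_response_ratio(hr_rest, hr_after):
    """Return (HR after walking - HR at rest) / HR at rest."""
    if hr_rest <= 0:
        raise ValueError("resting heart rate must be positive")
    return (hr_after - hr_rest) / hr_rest


# Hypothetical example: 70 bpm at rest, 105 bpm right after the walk.
print(hr_response_ratio(70, 105))  # 0.5
```

Dividing by the resting value gives a dimensionless ratio, so participants with different baseline heart rates can be compared on the same scale.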
7.
Nucleic Acids Res ; 51(W1): W343-W349, 2023 07 05.
Article in English | MEDLINE | ID: mdl-37178004

ABSTRACT

Predicting protein localization and understanding its mechanisms are critical in biology and pathology. In this context, we propose a new web application of MULocDeep with improved performance, result interpretation, and visualization. By transferring the original model into species-specific models, MULocDeep achieved competitive prediction performance at the subcellular level against other state-of-the-art methods. It uniquely provides a comprehensive localization prediction at the suborganellar level. Besides prediction, our web service quantifies the contribution of single amino acids to localization for individual proteins; for a group of proteins, common motifs or potential targeting-related regions can be derived. Furthermore, the visualizations of targeting mechanism analyses can be downloaded for publication-ready figures. The MULocDeep web service is available at https://www.mu-loc.org/.


Subjects
Proteins , Software , Amino Acids/metabolism , Computational Biology/methods , Protein Transport , Proteins/chemistry , Internet
8.
Nat Commun ; 14(1): 964, 2023 02 21.
Article in English | MEDLINE | ID: mdl-36810839

ABSTRACT

Single-cell multi-omics (scMulti-omics) allows the quantification of multiple modalities simultaneously to capture the intricacy of complex molecular mechanisms and cellular heterogeneity. Existing tools cannot effectively infer the active biological networks in diverse cell types and the response of these networks to external stimuli. Here we present DeepMAPS for biological network inference from scMulti-omics. It models scMulti-omics in a heterogeneous graph and learns relations among cells and genes within both local and global contexts in a robust manner using a multi-head graph transformer. Benchmarking results indicate DeepMAPS performs better than existing tools in cell clustering and biological network construction. It also showcases competitive capability in deriving cell-type-specific biological networks in lung tumor leukocyte CITE-seq data and matched diffuse small lymphocytic lymphoma scRNA-seq and scATAC-seq data. In addition, we deploy a DeepMAPS webserver equipped with multiple functionalities and visualizations to improve the usability and reproducibility of scMulti-omics data analysis.


Subjects
Benchmarking , Data Analysis , Reproducibility of Results , Cluster Analysis , Electric Power Supplies , Single-Cell Analysis
9.
Nat Commun ; 14(1): 812, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36781861

ABSTRACT

Unlike PIWI-interacting RNA (piRNA) in other species that mostly target transposable elements (TEs), >80% of piRNAs in adult mammalian testes lack obvious targets. However, mammalian piRNA sequences and piRNA-producing loci evolve more rapidly than the rest of the genome for unknown reasons. Here, through comparative studies of chickens, ducks, mice, and humans, as well as long-read nanopore sequencing on diverse chicken breeds, we find that piRNA loci across amniotes experience: (1) a high local mutation rate of structural variations (SVs, mutations ≥ 50 bp in size); (2) positive selection to suppress young and actively mobilizing TEs commencing at the pachytene stage of meiosis during germ cell development; and (3) negative selection to purge deleterious SV hotspots. Our results indicate that genetic instability at pachytene piRNA loci, while producing certain pathogenic SVs, also protects genome integrity against TE mobilization by driving the formation of rapid-evolving piRNA sequences.


Subjects
Chickens , Germ Cells , Humans , Male , Animals , Mice , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Chickens/genetics , Chickens/metabolism , Germ Cells/metabolism , Testis/metabolism , DNA Transposable Elements/genetics , Piwi-Interacting RNA , Mammals/genetics
10.
Nat Mach Intell ; 5(4): 337-339, 2023 Apr.
Article in English | MEDLINE | ID: mdl-38260002

ABSTRACT

Predicting whether T-cell receptors bind to specific peptides is a challenging problem, as the majority of binding examples in the training data involve only a few peptides. A new approach employs meta-learning to improve predictions for binding to peptides for which little or no binding data exist.

11.
Methods Mol Biol ; 2499: 105-124, 2022.
Article in English | MEDLINE | ID: mdl-35696076

ABSTRACT

Phosphorylation plays a vital role in signal transduction and the cell cycle. Identifying and understanding phosphorylation through machine-learning methods has a long history. However, existing methods only learn representations of a protein sequence segment from the labeled dataset itself, which could result in biased or incomplete features, especially for kinase-specific phosphorylation site prediction, in which training data are typically sparse. To learn a comprehensive contextual representation of a protein sequence segment for kinase-specific phosphorylation site prediction, we pretrained our model on over 24 million unlabeled sequence fragments using ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately). The pretrained model was applied to kinase-specific site prediction for the kinases CDK, PKA, CK2, MAPK, and PKC. The pretrained ELECTRA model achieves a 9.02% improvement over BERT and an 11.10% improvement over MusiteDeep in the area under the precision-recall curve on the benchmark data.


Subjects
Machine Learning , Protein Kinases , Phosphorylation , Protein Kinases/metabolism
12.
Nucleic Acids Res ; 50(D1): D333-D339, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34551440

ABSTRACT

Resolving the spatial distribution of the transcriptome at a subcellular level can increase our understanding of biology and diseases. To facilitate studies of biological functions and molecular mechanisms in the transcriptome, we updated RNALocate, a resource for RNA subcellular localization analysis that is freely accessible at http://www.rnalocate.org/ or http://www.rna-society.org/rnalocate/. Compared to RNALocate v1.0, the new features in version 2.0 include (i) expansion of the data sources and the coverage of species; (ii) incorporation and integration of RNA-seq datasets containing information about subcellular localization; (iii) addition and reorganization of RNA information (RNA subcellular localization conditions and descriptive figures for method, RNA homology information, RNA interaction and ncRNA disease information) and (iv) three additional prediction tools: DM3Loc, iLoc-lncRNA and iLoc-mRNA. Overall, RNALocate v2.0 provides a comprehensive RNA subcellular localization resource for researchers to deconvolute the highly complex architecture of the cell.


Subjects
Databases, Nucleic Acid , RNA, Untranslated/genetics , Software , Transcriptome , Animals , Base Sequence , Cell Compartmentation , Datasets as Topic , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Gene Expression Regulation , Gene Ontology , Humans , Internet , Mice , Molecular Sequence Annotation , RNA, Untranslated/classification , RNA, Untranslated/metabolism , Rats , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Sequence Alignment , Sequence Homology, Nucleic Acid , Subcellular Fractions/chemistry , Subcellular Fractions/metabolism , Zebrafish/genetics , Zebrafish/metabolism
13.
Comput Struct Biotechnol J ; 19: 5834-5844, 2021.
Article in English | MEDLINE | ID: mdl-34765098

ABSTRACT

The accurate annotation of protein localization is crucial in understanding protein function in tandem with a broad range of applications such as pathological analysis and drug design. Since most proteins do not have experimentally-determined localization information, the computational prediction of protein localization has been an active research area for more than two decades. In particular, recent machine-learning advancements have fueled the development of new methods in protein localization prediction. In this review paper, we first categorize the main features and algorithms used for protein localization prediction. Then, we summarize a list of protein localization prediction tools in terms of their coverage, characteristics, and accessibility to help users find suitable tools based on their needs. Next, we evaluate some of these tools on a benchmark dataset. Finally, we provide an outlook on the future exploration of protein localization methods.

14.
Comput Struct Biotechnol J ; 19: 4825-4839, 2021.
Article in English | MEDLINE | ID: mdl-34522290

ABSTRACT

Prediction of protein localization plays an important role in understanding protein function and mechanisms. In this paper, we propose a general deep learning-based localization prediction framework, MULocDeep, which can predict multiple localizations of a protein at both subcellular and suborganellar levels. We collected a dataset with 44 suborganellar localization annotations in 10 major subcellular compartments, the most comprehensive suborganelle localization dataset to date. We also experimentally generated an independent dataset of mitochondrial proteins in Arabidopsis thaliana cell cultures, Solanum tuberosum tubers, and Vicia faba roots and made this dataset publicly available. Evaluations using the above datasets show that overall, MULocDeep outperforms other major methods at both subcellular and suborganellar levels. Furthermore, MULocDeep assesses each amino acid's contribution to localization, which provides insights into the mechanism of protein sorting and localization motifs. A web server can be accessed at http://mu-loc.org.

15.
Nucleic Acids Res ; 49(W1): W228-W236, 2021 07 02.
Article in English | MEDLINE | ID: mdl-34037802

ABSTRACT

G2PDeep is an open-access web server that provides a deep-learning framework for quantitative phenotype prediction and discovery of genomic markers. It uses zygosity or single nucleotide polymorphism (SNP) information from plants and animals as the input to predict a quantitative phenotype of interest and the genomic markers associated with that phenotype. It provides a one-stop-shop platform for researchers to create deep-learning models through an interactive web interface and train these models with uploaded data, using high-performance computing resources at the backend. G2PDeep also provides a series of informative interfaces to monitor the training process and compare the performance among the trained models. The trained models can then be deployed automatically. The quantitative phenotype and genomic markers are predicted using a user-selected trained model and the results are visualized. Our state-of-the-art model has been benchmarked by other researchers and has demonstrated competitive performance in quantitative phenotype prediction. In addition, the server integrates the soybean nested association mapping (SoyNAM) dataset with five phenotypes, including grain yield, height, moisture, oil, and protein. A publicly available dataset for seed protein and oil content has also been integrated into the server. The G2PDeep server is publicly available at http://g2pdeep.org. The Python-based deep-learning model is available at https://github.com/shuaizengMU/G2PDeep_model.


Subjects
Genetic Markers , Phenotype , Software , Deep Learning , Genomics , Internet , Polymorphism, Single Nucleotide , /genetics
16.
Nucleic Acids Res ; 49(8): e46, 2021 05 07.
Article in English | MEDLINE | ID: mdl-33503258

ABSTRACT

Subcellular localization of messenger RNAs (mRNAs), as a prevalent mechanism, gives precise and efficient control over the translation process. There is mounting evidence for the important roles of this process in a variety of cellular events. Computational methods for mRNA subcellular localization prediction provide a useful approach for studying mRNA functions. However, few computational methods have been designed for mRNA subcellular localization prediction, and their performance has room for improvement. In particular, there is still no available tool to predict localization for mRNAs that have multiple localization annotations. In this paper, we propose a multi-head self-attention method, DM3Loc, for multi-label mRNA subcellular localization prediction. Evaluation results show that DM3Loc outperforms existing methods and tools in general. Furthermore, DM3Loc has the interpretation ability to analyze RNA-binding protein motifs and key signals on mRNAs for subcellular localization. Our analyses found hundreds of instances of mRNA isoform-specific subcellular localizations and many significantly enriched gene functions for mRNAs in different subcellular localizations.


Subjects
Computational Biology/methods , Neural Networks, Computer , RNA, Messenger/metabolism , Subcellular Fractions/metabolism , Cell Membrane/genetics , Cell Membrane/metabolism , Cell Nucleus/genetics , Cell Nucleus/metabolism , Cytosol/metabolism , Databases, Genetic , Databases, Protein , Endoplasmic Reticulum/genetics , Endoplasmic Reticulum/metabolism , Exosomes/genetics , Exosomes/metabolism , Gene Ontology , Humans , Proteomics , RNA, Messenger/genetics , Ribosomes/genetics , Ribosomes/metabolism , Transcriptome/genetics
17.
Comput Struct Biotechnol J ; 18: 1877-1883, 2020.
Article in English | MEDLINE | ID: mdl-32774783

ABSTRACT

Pseudouridine synthase binds to uridine sites and catalyzes the conversion of uridine to pseudouridine (Ψ). This binding takes place in a specific context and in a specific conformation of nucleotides. Most machine-learning methods for Ψ site classification use nucleotide frequency as a feature, which may not fully depict the relevant conformation around a Ψ site. Using the power of deep learning and raw sequence, as well as secondary structure features, our tool MU-PseUDeep is designed to capture both the sequence and secondary structure context; it feeds the raw RNA sequence and the predicted secondary structure to two sets of convolutional neural networks. It has shown considerable improvement in Ψ site prediction over the existing tools XG-PseU, PseUI, and iRNA-PseU for both balanced and imbalanced datasets. To the best of our knowledge, this is the most accurate tool for Ψ site prediction. We also used MU-PseUDeep to scan the human transcriptome, which shows that the genes with predicted Ψ sites are enriched in nucleotide and protein binding, as well as in neurodegeneration pathways. The tool is open source, available at https://github.com/smk5g5/MU-PseUDeep.

18.
Nucleic Acids Res ; 48(W1): W140-W146, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32324217

ABSTRACT

MusiteDeep is an online resource providing a deep-learning framework for protein post-translational modification (PTM) site prediction and visualization. The predictor only uses protein sequences as input and no complex features are needed, which results in a real-time prediction for a large number of proteins. It takes less than three minutes to predict for 1000 sequences per PTM type. The output is presented at the amino acid level for the user-selected PTM types. The framework has been benchmarked by other researchers and has demonstrated competitive performance in PTM site prediction. In this webserver, we updated the previous framework by utilizing more advanced ensemble techniques, and by providing prediction and visualization for multiple PTMs simultaneously so that users can analyze potential PTM cross-talk directly. Besides prediction, users can interactively review the predicted PTM sites in the context of known PTM annotations and protein 3D structures through homology-based search. In addition, the server maintains a local database providing pre-processed PTM annotations from UniProt/Swiss-Prot for users to download. This database will be updated every three months. The MusiteDeep server is available at https://www.musite.net. The stand-alone tools for locally using MusiteDeep are available at https://github.com/duolinwang/MusiteDeep_web.


Subjects
Deep Learning , Protein Processing, Post-Translational , Software , Computer Graphics , Internet , Models, Molecular , Protein Conformation , Proteins/chemistry , Sequence Analysis, Protein
19.
Bioinformatics ; 36(1): 169-176, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31168616

ABSTRACT

MOTIVATION: As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret or make use of the results toward pathway model generation and testing.
RESULTS: To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programming approach. IMPRes takes advantage of the existing pathway interaction knowledge in Kyoto Encyclopedia of Genes and Genomes. Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programming enables the detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. The evaluation experiments conducted on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study on a human lung cancer dataset was performed and we provided several insights on genes and mechanisms involved in lung cancer, which had not been discovered before.
AVAILABILITY AND IMPLEMENTATION: IMPRes visualization tool is available via web server at http://digbio.missouri.edu/impres.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Profiling , Models, Genetic , Software , Algorithms , Gene Expression Profiling/methods , Humans
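
The general idea in this abstract, assigning penalties to genes and interactions and then running a shortest-path search from one or more seed genes, can be sketched with a plain Dijkstra search. This is not the IMPRes implementation: the function names, the data layout, the toy network, and the choice to add a gene's penalty when the path enters that gene are all assumptions made for illustration.

```python
import heapq


def lowest_penalty_paths(edges, gene_penalty, seeds):
    """Dijkstra search from one or more seed genes.

    edges: {gene: [(downstream_gene, interaction_penalty), ...]}
    gene_penalty: {gene: non-negative penalty added when the path enters that gene}
    Returns (cost, parent): best total penalty per reachable gene and its predecessor.
    """
    cost = {s: 0.0 for s in seeds}
    parent = {s: None for s in seeds}
    heap = [(0.0, s) for s in seeds]
    heapq.heapify(heap)
    while heap:
        c, gene = heapq.heappop(heap)
        if c > cost.get(gene, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, w in edges.get(gene, []):
            new_cost = c + w + gene_penalty.get(nxt, 0.0)
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                parent[nxt] = gene
                heapq.heappush(heap, (new_cost, nxt))
    return cost, parent


def trace_path(parent, target):
    """Walk predecessor links back from target to its seed gene."""
    path = []
    while target is not None:
        path.append(target)
        target = parent[target]
    return path[::-1]


# Hypothetical toy network: gene names and penalties are invented.
toy_edges = {"A": [("B", 1.0), ("C", 4.0)], "B": [("C", 1.0)], "C": []}
toy_penalty = {"B": 0.5, "C": 0.2}
toy_cost, toy_parent = lowest_penalty_paths(toy_edges, toy_penalty, ["A"])
print(trace_path(toy_parent, "C"))  # ['A', 'B', 'C']
```

Because the predecessor map is kept, every detected downstream gene can be traced back to its seed one step at a time, which matches the traceability the abstract emphasizes.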
20.
Methods ; 173: 16-23, 2020 02 15.
Article in English | MEDLINE | ID: mdl-31220603

ABSTRACT

Large amounts of omics data have been generated and have contributed to increasing knowledge about the associated biological mechanisms. A new challenge is how to identify the active pathways and extract useful insights from these data, which carry a great deal of background information and noise. Although biologically meaningful modules can often be detected by many existing informatics tools, it is still hard to interpret or make use of the results towards in silico hypothesis generation and testing. To address this gap, we previously developed the IMPRes (Integrative MultiOmics Pathway Resolution) v1.0 algorithm, a new step-wise active pathway detection method using a dynamic programming approach. This approach enables the network detection one step at a time, making it easy for researchers to trace the pathways, and leading to more accurate drug design and more effective treatment strategies. In this paper, we present IMPRes-Pro, an enhancement to IMPRes v1.0 that integrates proteomics data along with transcriptomics data and constructs a heterogeneous background network. The evaluation experiment conducted on a human primary breast cancer dataset has shown its advantage over the original IMPRes v1.0 method. Furthermore, a case study on a human metastatic breast cancer dataset was performed, and we have provided several insights regarding the selection of an optimal therapy strategy. The IMPRes-Pro algorithm and visualization tool are available as a web service at http://digbio.missouri.edu/impres.


Subjects
Breast Neoplasms/genetics , Computational Biology/methods , Proteomics/methods , Software , Algorithms , Breast Neoplasms/pathology , Computer Graphics , Computer Simulation , Female , Gene Expression Profiling/methods , Humans